# Multilingual speech recognition
Ipa Whisper Base
Apache-2.0
A multilingual speech recognition model fine-tuned from Whisper-base that produces International Phonetic Alphabet (IPA) output.
Speech Recognition
Safetensors Supports Multiple Languages
neurlang
599
6
Canary 1b Flash
NVIDIA NeMo Canary Flash is a family of multilingual multitask models that achieve state-of-the-art performance across multiple speech benchmarks. The models support automatic speech recognition and speech translation in four languages.
Speech Recognition Supports Multiple Languages
nvidia
125.22k
186
Faster Whisper Large V3 Turbo Int8 Ct2
MIT
This is a CTranslate2-converted version of OpenAI's Whisper-large-v3-turbo model. It uses INT8 quantization and is designed primarily for efficient speech recognition.
Speech Recognition Supports Multiple Languages
Zoont
123
4
Mahadhwani Pretrained Conformer
MIT
A Conformer encoder pre-trained with self-supervised learning, supporting automatic speech recognition for the 22 scheduled languages of India.
Speech Recognition
ai4bharat
349
1
Whisper Large V3 Distil Multi7 V0.2
MIT
A distilled multilingual Whisper model for automatic speech recognition in 7 European languages, with support for code-switching.
Speech Recognition
Transformers Supports Multiple Languages

bofenghuang
119
1
Whisper Large V3 Turbo
Apache-2.0
Whisper large-v3-turbo is a distilled version of OpenAI Whisper large-v3 with the number of decoder layers reduced from 32 to 4, which significantly improves speed at the cost of a slight drop in quality.
Speech Recognition Supports Multiple Languages
deepdml
883
6
Whisperfile
Apache-2.0
Whisper is a Transformer-based encoder-decoder model for speech recognition and translation tasks, supporting multilingual processing.
Speech Recognition
cjpais
353
9
Whisper Small Uz En Ru Lang Id
Apache-2.0
A fine-tuned multilingual speech classification model based on Whisper-small, supporting speech recognition and classification for Uzbek, English, and Russian.
Audio Classification
Transformers Supports Multiple Languages

fitlemon
17
1
Owsm Ctc V3.1 1B
OWSM-CTC is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC, supporting multilingual speech recognition, speech translation, and language identification.
Speech Recognition Other
espnet
116
13
Whisper Large V3 Japanese 4k Steps Ct2
MIT
This is a CTranslate2-converted version of the OpenAI Whisper large-v3 model, fine-tuned specifically for Japanese with an additional 4,000 training steps and supporting multilingual speech recognition.
Speech Recognition Supports Multiple Languages
JhonVanced
54
4
Canary 1b
Canary-1B is a multilingual multi-task model developed by NVIDIA NeMo, supporting automatic speech recognition and speech translation tasks in English, German, French, and Spanish.
Speech Recognition Supports Multiple Languages
nvidia
7,734
421
Whisper Large V3 Ft Cv16 Mn
Apache-2.0
A speech recognition model based on OpenAI Whisper Large V3 and fine-tuned on the Common Voice 16.0 dataset.
Speech Recognition
Transformers

sanchit-gandhi
34
1
Multilingual Distilwhisper 28k
MIT
An improved multilingual automatic speech recognition model based on whisper-small that enhances performance on target languages through a CLSR module and knowledge distillation.
Speech Recognition
Transformers Other

naver
47
13
Faster Whisper Tiny
MIT
A CTranslate2-converted version of OpenAI's Whisper tiny model for efficient speech recognition.
Speech Recognition Supports Multiple Languages
Systran
875.91k
10
Faster Whisper Base
MIT
This is a CTranslate2-converted version of OpenAI's Whisper base model, designed for efficient speech recognition.
Speech Recognition Supports Multiple Languages
Systran
1.1M
13
Faster Whisper Medium
MIT
This is a CTranslate2-converted version of OpenAI's Whisper medium model, designed for efficient speech recognition.
Speech Recognition Supports Multiple Languages
Systran
155.87k
29
Faster Whisper Large V3
MIT
Whisper large-v3 is a large-scale multilingual automatic speech recognition (ASR) model developed by OpenAI, supporting speech-to-text tasks in multiple languages.
Speech Recognition Supports Multiple Languages
Systran
713.48k
376
Whisper Large V3
Apache-2.0
Whisper is an advanced automatic speech recognition (ASR) and speech translation model proposed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization capabilities.
Speech Recognition Supports Multiple Languages
openai
4.6M
4,321
Lang Id Voxlingua107 Ecapa
Apache-2.0
An ECAPA-TDNN-based spoken language identification model trained on the VoxLingua107 dataset, supporting classification of 107 languages.
Audio Classification Supports Multiple Languages
apenasissso
19
0
Faster Whisper Large V1
MIT
This is a CTranslate2-converted version of the OpenAI Whisper large-v1 model for efficient speech recognition.
Speech Recognition Supports Multiple Languages
guillaumekln
237
4
Faster Whisper Large V2
MIT
This is a CTranslate2-converted version of the OpenAI Whisper large-v2 model for efficient speech recognition.
Speech Recognition Supports Multiple Languages
guillaumekln
161.19k
199
Faster Whisper Medium
MIT
This project converts the openai/whisper-medium model to the CTranslate2 model format, which can be used for efficient speech recognition.
Speech Recognition Supports Multiple Languages
guillaumekln
15.17k
33
Faster Whisper Base
MIT
The Whisper base model is an Automatic Speech Recognition (ASR) model developed by OpenAI, supporting speech-to-text tasks in multiple languages.
Speech Recognition Supports Multiple Languages
guillaumekln
8,493
10
Whisper Large V2 Slovenian
Apache-2.0
This model is a speech recognition model fine-tuned on the Common Voice 11.0 Slovenian dataset based on OpenAI's Whisper Large-V2 model, with a word error rate of 13.83%.
Speech Recognition
Transformers Other

DrishtiSharma
53
1
Whisper Large V2
Apache-2.0
Whisper is a pre-trained automatic speech recognition (ASR) and speech translation model trained on 680,000 hours of labeled data, with strong generalization capabilities.
Speech Recognition Supports Multiple Languages
openai
176.55k
1,725
Wav2vec2 Xls R 300m Mixed
A speech recognition model fine-tuned on mixed-language datasets based on Facebook's wav2vec2-xls-r-300m model, supporting Malay, Singaporean English, and Mandarin.
Speech Recognition
Transformers

mesolitica
10.07k
4
Xlsr Wav2vec2 2
Apache-2.0
A fine-tuned speech recognition model based on facebook/wav2vec2-large-xlsr-53, supporting multilingual speech-to-text tasks.
Speech Recognition
Transformers

chrisvinsen
20
0
Aspram
Apache-2.0
An Armenian automatic speech recognition model based on the wav2vec2-xls-r-2b architecture, supporting the Armenian language (hy/hye).
Speech Recognition
Transformers Other

YSU
170
4
Test Audio
MIT
A Transformer-based end-to-end speech translation model specifically designed for French-to-English speech translation tasks.
Speech Recognition
Transformers Supports Multiple Languages

joaogante
19
0
Xtreme S Xlsr 300m Fleurs Langid
Apache-2.0
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - FLEURS.ALL dataset for multilingual speech recognition tasks.
Audio Classification
Transformers Other

anton-l
17
0
Xtreme S Xlsr 300m Minds14
Apache-2.0
A multilingual speech recognition model fine-tuned on the GOOGLE/XTREME_S - MINDS14.ALL dataset based on facebook/wav2vec2-xls-r-300m
Audio Classification
Transformers Other

anton-l
467
2
Xtreme S Xlsr Mls Upd
Apache-2.0
A Polish speech recognition model fine-tuned on the GOOGLE/XTREME_S - MLS.PL dataset based on facebook/wav2vec2-xls-r-300m
Speech Recognition
Transformers Other

anton-l
16
0
Wav2vec2 Base 10k Voxpopuli
A foundational speech recognition model pretrained on 10,000 hours of unlabeled data from the VoxPopuli corpus, supporting multilingual speech processing
Speech Recognition
Transformers Other

facebook
2,504
0
Wav2vec2 Large Xlsr 53 Demo Colab
Apache-2.0
This model is a speech recognition model fine-tuned on the common_voice dataset based on facebook/wav2vec2-large-xlsr-53, primarily used for robust speech event recognition.
Speech Recognition
Transformers

emre
16
0
Wav2vec2 Large Mt Voxpopuli V2
Facebook's Wav2Vec2 large model, pretrained exclusively on unlabeled data from the VoxPopuli corpus for Maltese (mt), suitable for speech recognition tasks.
Speech Recognition
Transformers Other

facebook
25
0
Wav2vec2 Base 100k Voxpopuli
A speech recognition base model pretrained on 100,000 hours of unannotated data from the VoxPopuli corpus.
Speech Recognition
Transformers Other

facebook
148
4
Wav2vec2 Large 100k Voxpopuli
A speech recognition model pre-trained on 100,000 hours of unlabeled data from the VoxPopuli corpus, supporting multilingual speech representation learning.
Speech Recognition Other
facebook
2,218
4
Wav2vec2 Large Xlsr 53 Demo Colab
Apache-2.0
This is an automatic speech recognition model based on the wav2vec2 architecture, specifically optimized for the Tamil language and supporting Nepali speech recognition tasks.
Speech Recognition
Transformers Other

Mahalakshmi
17
0
Wav2vec2 Pretrained Clsril 23 10k
An audio pre-training model based on self-supervised learning that learns cross-lingual speech representations from raw audio in 23 Indian languages.
Speech Recognition
Transformers

Harveenchadha
32
5
Asr Voxrex Bart Base
This is an automatic speech recognition model based on a sequence-to-sequence architecture, capable of converting speech into text.
Speech Recognition
Transformers

KBLab
28
0